{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Partial Dependence Plots\n\n\nPartial dependence plots show the dependence between the target function [2]_\nand a set of 'target' features, marginalizing over the values of all other\nfeatures (the complement features). Due to the limits of human perception, the\nsize of the target feature set must be small (usually, one or two) thus the\ntarget features are usually chosen among the most important features.\n\nThis example shows how to obtain partial dependence plots from a\n:class:`~sklearn.neural_network.MLPRegressor` and a\n:class:`~sklearn.ensemble.HistGradientBoostingRegressor` trained on the\nCalifornia housing dataset. The example is taken from [1]_.\n\nThe plots show four 1-way and two 1-way partial dependence plots (omitted for\n:class:`~sklearn.neural_network.MLPRegressor` due to computation time). The\ntarget variables for the one-way PDP are: median income (`MedInc`), average\noccupants per household (`AvgOccup`), median house age (`HouseAge`), and\naverage rooms per household (`AveRooms`).\n\n.. [1] T. Hastie, R. Tibshirani and J. Friedman, \"Elements of Statistical\n       Learning Ed. 2\", Springer, 2009.\n\n.. [2] For classification you can think of it as the regression score before\n       the link function.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(__doc__)\n\nfrom time import time\nimport numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom mpl_toolkits.mplot3d import Axes3D\n\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.preprocessing import QuantileTransformer\nfrom sklearn.pipeline import make_pipeline\n\nfrom sklearn.inspection import partial_dependence\nfrom sklearn.inspection import plot_partial_dependence\nfrom sklearn.experimental import enable_hist_gradient_boosting  # noqa\nfrom sklearn.ensemble import HistGradientBoostingRegressor\nfrom sklearn.neural_network import MLPRegressor\nfrom sklearn.datasets import fetch_california_housing"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "California Housing data preprocessing\n-------------------------------------\n\nCenter target to avoid gradient boosting init bias: gradient boosting\nwith the 'recursion' method does not account for the initial estimator\n(here the average target, by default)\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "cal_housing = fetch_california_housing()\nX = pd.DataFrame(cal_housing.data, columns=cal_housing.feature_names)\ny = cal_housing.target\n\ny -= y.mean()\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1,\n                                                    random_state=0)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Partial Dependence computation for multi-layer perceptron\n---------------------------------------------------------\n\nLet's fit a MLPRegressor and compute single-variable partial dependence\nplots\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"Training MLPRegressor...\")\ntic = time()\nest = make_pipeline(QuantileTransformer(),\n                    MLPRegressor(hidden_layer_sizes=(50, 50),\n                                 learning_rate_init=0.01,\n                                 early_stopping=True))\nest.fit(X_train, y_train)\nprint(\"done in {:.3f}s\".format(time() - tic))\nprint(\"Test R2 score: {:.2f}\".format(est.score(X_test, y_test)))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We configured a pipeline to scale the numerical input features and tuned the\nneural network size and learning rate to get a reasonable compromise between\ntraining time and predictive performance on a test set.\n\nImportantly, this tabular dataset has very different dynamic ranges for its\nfeatures. Neural networks tend to be very sensitive to features with varying\nscales and forgetting to preprocess the numeric feature would lead to a very\npoor model.\n\nIt would be possible to get even higher predictive performance with a larger\nneural network but the training would also be significantly more expensive.\n\nNote that it is important to check that the model is accurate enough on a\ntest set before plotting the partial dependence since there would be little\nuse in explaining the impact of a given feature on the prediction function of\na poor model.\n\nLet's now compute the partial dependence plots for this neural network using\nthe model-agnostic (brute-force) method:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print('Computing partial dependence plots...')\ntic = time()\n# We don't compute the 2-way PDP (5, 1) here, because it is a lot slower\n# with the brute method.\nfeatures = ['MedInc', 'AveOccup', 'HouseAge', 'AveRooms']\nplot_partial_dependence(est, X_train, features,\n                        n_jobs=3, grid_resolution=20)\nprint(\"done in {:.3f}s\".format(time() - tic))\nfig = plt.gcf()\nfig.suptitle('Partial dependence of house value on non-location features\\n'\n             'for the California housing dataset, with MLPRegressor')\nfig.subplots_adjust(hspace=0.3)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Partial Dependence computation for Gradient Boosting\n----------------------------------------------------\n\nLet's now fit a GradientBoostingRegressor and compute the partial dependence\nplots either or one or two variables at a time.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"Training GradientBoostingRegressor...\")\ntic = time()\nest = HistGradientBoostingRegressor()\nest.fit(X_train, y_train)\nprint(\"done in {:.3f}s\".format(time() - tic))\nprint(\"Test R2 score: {:.2f}\".format(est.score(X_test, y_test)))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Here, we used the default hyperparameters for the gradient boosting model\nwithout any preprocessing as tree-based models are naturally robust to\nmonotonic transformations of numerical features.\n\nNote that on this tabular dataset, Gradient Boosting Machines are both\nsignificantly faster to train and more accurate than neural networks. It is\nalso significantly cheaper to tune their hyperparameters (the default tend to\nwork well while this is not often the case for neural networks).\n\nFinally, as we will see next, computing partial dependence plots tree-based\nmodels is also orders of magnitude faster making it cheap to compute partial\ndependence plots for pairs of interacting features:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print('Computing partial dependence plots...')\ntic = time()\nfeatures = ['MedInc', 'AveOccup', 'HouseAge', 'AveRooms',\n            ('AveOccup', 'HouseAge')]\nplot_partial_dependence(est, X_train, features,\n                        n_jobs=3, grid_resolution=20)\nprint(\"done in {:.3f}s\".format(time() - tic))\nfig = plt.gcf()\nfig.suptitle('Partial dependence of house value on non-location features\\n'\n             'for the California housing dataset, with Gradient Boosting')\nfig.subplots_adjust(wspace=0.4, hspace=0.3)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Analysis of the plots\n---------------------\n\nWe can clearly see that the median house price shows a linear relationship\nwith the median income (top left) and that the house price drops when the\naverage occupants per household increases (top middle).\nThe top right plot shows that the house age in a district does not have\na strong influence on the (median) house price; so does the average rooms\nper household.\nThe tick marks on the x-axis represent the deciles of the feature values\nin the training data.\n\nWe also observe that :class:`~sklearn.neural_network.MLPRegressor` has much\nsmoother predictions than\n:class:`~sklearn.ensemble.HistGradientBoostingRegressor`. For the plots to be\ncomparable, it is necessary to subtract the average value of the target\n``y``: The 'recursion' method, used by default for\n:class:`~sklearn.ensemble.HistGradientBoostingRegressor`, does not account\nfor the initial predictor (in our case the average target). Setting the\ntarget average to 0 avoids this bias.\n\nPartial dependence plots with two target features enable us to visualize\ninteractions among them. The two-way partial dependence plot shows the\ndependence of median house price on joint values of house age and average\noccupants per household. We can clearly see an interaction between the\ntwo features: for an average occupancy greater than two, the house price is\nnearly independent of the house age, whereas for values less than two there\nis a strong dependence on age.\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "3D interaction plots\n--------------------\n\nLet's make the same partial dependence plot for the 2 features interaction,\nthis time in 3 dimensions.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "fig = plt.figure()\n\nfeatures = ('AveOccup', 'HouseAge')\npdp, axes = partial_dependence(est, X_train, features=features,\n                               grid_resolution=20)\nXX, YY = np.meshgrid(axes[0], axes[1])\nZ = pdp[0].T\nax = Axes3D(fig)\nsurf = ax.plot_surface(XX, YY, Z, rstride=1, cstride=1,\n                       cmap=plt.cm.BuPu, edgecolor='k')\nax.set_xlabel(features[0])\nax.set_ylabel(features[1])\nax.set_zlabel('Partial dependence')\n#  pretty init view\nax.view_init(elev=22, azim=122)\nplt.colorbar(surf)\nplt.suptitle('Partial dependence of house value on median\\n'\n             'age and average occupancy, with Gradient Boosting')\nplt.subplots_adjust(top=0.9)\n\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}